The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels

Neural Information Processing Systems

Such embeddings induce the so-called maximum mean discrepancy (MMD; [Smola et al., 2007, Gretton et al., 2012]), which quantifies the discrepancy … Many estimators for HSIC exist. The classical ones rely on U-statistics or V-statistics [Gretton et al., 2005, Quadrianto et al., 2009, Pfister et al., 2018] and are known to converge at a rate of … Lower bounds for the related MMD are known [Tolstikhin et al., 2016], but the existing analysis considers radial kernels and relies on independent Gaussian distributions.
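To make the classical U- and V-statistic HSIC estimators concrete, here is a minimal NumPy sketch. It assumes a Gaussian kernel (one example of a translation-invariant kernel) with a hypothetical fixed bandwidth of 1.0. The V-statistic is the biased form trace(KHLH)/n² with centering matrix H = I − 11ᵀ/n. The U-statistic is the standard unbiased form built from Gram matrices with zeroed diagonals. The helper names are illustrative, not the paper's code.

```python
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(x.shape[0], -1)  # treat 1-D input as n samples of dimension 1
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))

def hsic_v(K, L):
    """Biased V-statistic estimator: trace(K H L H) / n^2 with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2

def hsic_u(K, L):
    """Unbiased U-statistic estimator using Gram matrices with zeroed diagonals (needs n >= 4)."""
    n = K.shape[0]
    Kt = K.copy()
    Lt = L.copy()
    np.fill_diagonal(Kt, 0.0)
    np.fill_diagonal(Lt, 0.0)
    ones = np.ones(n)
    term1 = np.trace(Kt @ Lt)
    term2 = (ones @ Kt @ ones) * (ones @ Lt @ ones) / ((n - 1) * (n - 2))
    term3 = 2.0 * (ones @ Kt @ Lt @ ones) / (n - 2)
    return (term1 + term2 - term3) / (n * (n - 3))

# Toy usage: a dependent pair (y_dep depends on x) versus an independent pair.
rng = np.random.default_rng(1)
n = 200
x = rng.standard_normal(n)
y_dep = x + 0.1 * rng.standard_normal(n)
y_ind = rng.standard_normal(n)
K = gaussian_gram(x)
L_dep = gaussian_gram(y_dep)
L_ind = gaussian_gram(y_ind)
print("V-statistic:", hsic_v(K, L_dep), hsic_v(K, L_ind))
print("U-statistic:", hsic_u(K, L_dep), hsic_u(K, L_ind))
```

Both estimators cost O(n²) memory and time per Gram matrix. The V-statistic is always non-negative but biased upward. The U-statistic removes that bias at the price of occasionally taking small negative values under independence.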

Appendices

Neural Information Processing Systems

Appendix A provides derivations supporting Section 3 in the main paper. In this section we provide detailed derivations of the ST-DGMRF joint distribution, for both first-order transition models (Section A.1) and higher-order transition models (Section A.2).

A.1 Joint distribution

The LDS (see Sections 2.2 and 3.1 in the main paper) defines a joint distribution over system states … First, note that Eq. (1) can be written as a set of linear equations … We make use of this property in the DGMRF formulation and in the conjugate gradient method. … Eq. (11) is converted into a discrete-time dynamical system by approximating ρ … We consider two ST-DGMRF variants that capture different amounts of prior knowledge. DGMRF transition matrices can be parameterized accordingly. … The air quality dataset is based on hourly PM2.5 measurements obtained from […] The raw PM2.5 measurements are log-transformed and standardized to zero mean and unit variance. … Approximately 50% of the nodes are masked out (purple nodes within …) We use a simple MLP with one hidden layer of width 16 with ReLU activations and no output non-linearity. The DGMRF parameters are not shared across time, allowing for dynamically changing spatial covariance patterns.
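Section A.1 says the LDS can be written as a set of linear equations and that this is exploited together with the conjugate gradient method. The sketch below illustrates that idea under strong simplifying assumptions that are not taken from the paper: a first-order transition x_t = F x_{t−1} + ε_t with a known, hypothetical matrix F, unit-variance i.i.d. noise ε_t, no spatial DGMRF layers, and Gaussian observations y = x + noise on unmasked entries. The stacked equations define an operator G with G x = ε. The prior precision is then GᵀG, and the posterior mean solves (GᵀG + diag(mask)/σ²) μ = mask ⊙ y / σ². This system is solved matrix-free with plain conjugate gradient. All function names are illustrative.

```python
import numpy as np

def apply_G(x, F):
    """Stacked first-order LDS equations: (Gx)_0 = x_0, (Gx)_t = x_t - F x_{t-1}.

    x has shape (T, N): row t holds the N node states at time t.
    """
    e = x.copy()
    e[1:] -= x[:-1] @ F.T
    return e

def apply_GT(e, F):
    """Adjoint of apply_G: (G^T e)_t = e_t - F^T e_{t+1}, with e_T treated as 0."""
    out = e.copy()
    out[:-1] -= e[1:] @ F
    return out

def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    """Plain conjugate gradient for a symmetric positive-definite linear operator."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    p = r.copy()
    rs = np.sum(r * r)
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = apply_A(p)
        alpha = rs / np.sum(p * Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = np.sum(r * r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def posterior_mean(y, mask, F, noise_var):
    """Posterior mean of x given y = x + noise on observed entries (mask == 1).

    Solves (G^T G + diag(mask) / noise_var) mu = mask * y / noise_var without
    ever forming G; G is unit lower-triangular, so the system is positive definite.
    """
    def apply_A(v):
        return apply_GT(apply_G(v, F), F) + mask * v / noise_var
    return conjugate_gradient(apply_A, mask * y / noise_var)

# Hypothetical toy problem: T = 6 time steps, N = 4 nodes, about half the entries masked.
rng = np.random.default_rng(0)
T, N = 6, 4
F = 0.5 * np.eye(N) + 0.1 * rng.standard_normal((N, N))
mask = (rng.random((T, N)) < 0.5).astype(float)
y = rng.standard_normal((T, N)) * mask
mu = posterior_mean(y, mask, F, noise_var=0.1)
print(mu.shape)
```

Because only products with G and Gᵀ are needed, the cost per CG iteration scales with the number of nonzeros in the transition structure rather than with the size of a dense T·N × T·N matrix. This is the main reason a linear-equation formulation pairs naturally with conjugate gradient.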